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Abstract 

This paper deals with the problem of extracting the activity of indi- 
vidual neurons from multi-electrode recordings. Important aspects of 
this work are: 1) the sorting is done in two stages - a statistical model 
of the spikes from different cells is built and only then are occurrences 
of these spikes in the data detected by scanning through the original 
data, 2) the spike sorting is done in the frequency domain, 3) strict 
statistical tests are applied to determine if and how a spike should be 
classiffed, 4) the statistical model for detecting overlaping spike events 
is proposed, 5) slow dynamics of spike shapes are tracked during long 
experiments. Results from the application of these techniques to data 
collected from the escape response system of the American cockroach, 
Periplaneta americana, are presented. 
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1 Introduction 



Classical methods for exploring the mechanisms of brain function involve 
recording the electrical activity of single nerve cells. Much has been learned 
from this approach, but there are several reasons to go beyond single neuron 
recording [Rieke et al., 1997| . First, multineuron recording greatly increases 
the efficiency of studying the properties of single neurons. Second, record- 
ing simultaneously from many neurons allows access to the precise temporal 
relations among action potentials in multiple neurons Usrey et al., 1998| . 
This can provide a testing ground for the hypothesis that these temporal 
relationships carry significant information. Finally, multineuron recording 
experiments might give us a glimpse into the collective dynamics in neu- 
ral networks, e.g. the existence of multiple stable states and possibility of 
switching between them |Amit, 1989HAbeles and Gerstein, 1988| . 

This paper is concerned with one of the many technical problems that 
arise in trying to adddress the above questions, namely the problem of 
sorting out signals from multiple neurons as they appear on multiple elec- 
trodes IMcNaughton et al., 19831 |R 

eece and Q'Keefe, 1989| . In a multineu- 
ron recording each cell can appear on multiple electrodes and multiple cells 
can appear on each electrode. Finding the spike times for each cell is difficult 
because spikes from different cells are very similar in shape, making it hard 
to distinguish among them. In addition, the events which might be most 
interesting, synchronous spiking of nearby cells, are among the most difficult 
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to disentangle. 



2 The spike sorting problem 



Throughout the nervous system, cells generate stereotyped electrical pulses 
termed action potentials or spikes. Thus, in the absence in the absence of 
noise or variability, it is expected that each neuron in the system would 
generate a characteristic signal at each electrode. This waveform can be 
viewed as a single point in signal space. In the presence of noise, however, 
these discrete points spread into clusters of points, each point representing 
a single spike. One of the objectives of the spike sorting method described 
here is to make a model of these clusters that is accurate enough to allow 
the assignment of any given voltage waveform to one of the clusters. Even 
though it is unhkely that all the details of the cluster model are correct, our 
hope is that any such errors will not hamper spike classification. 

If V is the set of voltages measured on all of the electrodes during a 
short time segment, the probability distribution of these voltages may be 
decomposed as 



where P(c) is the total probability of observing a spike from cell c and P(V|c) 
is the distribution of voltage waveforms that arise from this cell. Conversely, 
if a set of voltage signals V is observed, the probabihty that it comes from 




(1) 



c 
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cell c can be derived from Equation (0) by applying Bayes' rule, 

Ideally one would like assignments which are certain. Statements of the form 
"probably a spike from cell 2, but maybe from cell 1," "probably a spike from 
cell 3, but maybe overlapping spikes from cells 4 and 5," defeat our purpose. 
To avoid this type of problem the distributions P(V|c) and P(V|c') should 
not overlap when c ^ c'. Furthermore, it is hoped that if these distributions 
overlap only slightly, a precise model for the distribution of each cluster may 
not be necessary and a small number of parameters may suffice to make 
reliable distinctions among the clusters. If a reliable model of this form can 
be made, then the assignment problem is solved. Note also that, in this limit, 
our prior assumptions about the likelihood of different neural responses plays 
no role. 

The approach to the clustering problem described here is as follows. First 
it is assumed that individual clusters P(V|c) have a Gaussian form in which 
each frequency component fluctuates independently. Next, the best possible 
clusters are found and the mean and variances at each frequency are calcu- 
lated for every cluster. If this Gaussian model were correct, the probability 
that a given cluster could generate a particular waveform would depend only 
on the distance between that waveform and the cluster mean, with ap- 
propriate weighting by the variances. These values provide a set of new 
dimensions along which distributions should be nonoverlapping if certainty 
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in the assignments has been achieved. In addition, it is possible to check if 
the clustering is self consistent because every member waveform of a given 
cluster will yield a distance of ~ 1 for that cluster and >> 1 for all of 
the other clusters. 

This combination of below a threshold for one cluster and above thresh- 
old for all other clusters is the signature of unambiguous assignment. The 
reader should note that we achieve this result despite the fact that the real 
clusters need not obey the assumptions of our simple model. This can be 
seen by looking closely at the form of the resultant x^ distributions. The im- 
plementation of these ideas takes on the following form. First, a statistical 
model of the spikes from different cells is built by clustering. Second, in a 
completely independent stage we detect the occurrence of these spikes in the 
data. This is initially done for single spikes. Finally, by superimposing these 
statistical models, overlaping spikes are treated as well. 

We emphasize once more that we are not proposing the Gaussian model 
as an exact model of the relevant probability distributions. On the contrary, 
the goal is to show that the Gaussian model suggests dimensions along which 
clusters are discriminable, and that the real clusters are almost perfectly dis- 
criminable along these dimensions. Once this is established, the precise form 
of the model is irrelevant. In cases where nonGaussian behavior is known 
to be crucial, one might start with different assumptions but the general 
strategy would be the same: we want to exhibit explictly the discriminabil- 
ity of clusters along some small number of relevant directions in waveform 
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space, and the starting model is just an aid to finding these dimensions. 
Similar ideas using a test were proposed in recent work by Pouzat et al. 
IFouzat, 2U02| . 

3 The spike sorting algorithm 

The first step is to build the set of clusters. This is done by identifying the 
different spike shapes in the data and constructing a statistical model for 
each recognized spike type. As with many other spike sorting algorithms 
the work described here is based on the following assumptions: a) different 
spikes from the same cell are very similar, b) spikes from different cells have 
different waveforms on at least one electrode, c) if a cell fires once, it fires 
many times and d) overlaps are fairly rare. 

3.1 Clustering 

Before the actual clustering can be done a large number of each of the dif- 
ferent spike types is needed. To do this the data is broken into short frames. 
The content of each frame is then examined to see if it contains "clean" spikes 
which are described below. A frame refers to a set of data snippets, one from 
each electrode at a given time. We emphasize that during this initial pass 
through the data we are interested only in collecting clean representatives of 
every spike type and no attempt is made to deal with frames that did not 
obviously contain only one single spike. 

The object of the clustering is to group similar spikes together. This must 
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work in spite of the fact that two similar spikes may be shifted shghtly in time 
|Lewicki, 1998| . To deal with this problem great care is given to accurately 
aligning the spikes during the clustering process. 

Objects to be clustered- While we measure time domain signals, from 
now on each frame, /, is represented by a vector composed of the concate- 
nated Fourier transforms of the voltage waveforms of the data from each of 
the iVe electrodes, i.e., 

Sf{^) = [Sf,l{uj),Sf^2{i^), ■ ■ -Sf^N^^)] . (3) 

The alignment of spikes in the frequency domain is achieved simply by mul- 
tiplying the Fourier components by e**^"^ where r is the necessary time delay 
(see below). It should be noted that although the work presented here deals 
with sorting concatenated spectra, other objects can be sorted as well. Ex- 
tensive experiments on sorting different objects were done while developing 
the methods described here but for our data Sf{u) proved to be the most 
useful. For example, reference [Rinberg et al., 1999| describes the sorting 
of transfer functions between electrodes which is independent of the spike 
shape. This may be of interest in cases where the spike shape can change 
[Fee et al., 1996b| {e.g. in bursting cells). In certain cases power spectra, 
which are invariant to time shifts, can be used as well. 

Description of the algorithm- The clustering algorithm is outlined in 
Box(l). Its various steps are described below. 

• Line 1: Initialization- First all frames are averaged yielding an av- 
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erage signal, 

Nf 

^ /=i 

and variance, 




where Nf is the number of frames, u> is the discrete frequency index 
and the indices /, c, and e refer to the frame, cluster and electrode 
respectively. For this initial averaging the time shifts, r/, described 
below, are set to zero for all frames. 

• Loop 2: Split clusters- In every pass, except the first, each cluster 
is split in two using a small random vector (5c,e, i-G., Sc^e ^ Sc^^ =t 
^c,e- This is repeated until some criterion is met. Establishing this 
criterion proved to be a difficult problem to solve generally. An a priori 
knowledge of the expected number of cells was found to be the most 
reliable criterion for stopping the cluster splitting. 

• Loop 3: Reassign frames- This loop executes an expectation maxi- 
mization algorithm. The average and variance at each frequency com- 
ponent of every cluster are calculated using the appropriate time delay 
for every member frame calculated against each cluster. The time delay 
Tf that minimizes Xf,ci^f) calculated against all clusters. The frame 
is then reassigned to the cluster that matches it most closely, i.e., the 
one that yields the minimum 
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Xf^ci'^f)- This is done until the frames are distributed in such a way 
that these parameters no longer change [Papoulis, 1965| . 

• Line 4: Finalize clusters- After the clustering ends some clusters 
may actually be identical within statistical error. These are merged 
into a single cluster. Others might clearly contain frames of different 
types and are split into two clusters. Still others might contain frames 
that clearly do not contain spikes or perhaps contain very few spikes. 
These are discarded. 

3.2 Detection 

The end result of the clustering phase is a statistical template for all of the 
spike types found in the data. In the next and final phase the data is scanned 
to find all occurrences of each spike type. The basic idea of the detection is 
to cut the data into short frames and determine which cluster best describes 
the data in that frame and its precise timing. Spikes that are well centered 
in the frames are detected and subtracted from the data. The data is then 
reframed and the process is repeated until no single spikes remain in the 
data. Finally, this process is repeated for overlaps. This process is shown 
schematically in Figure ((H). 

Single spike detection- Single spike detection is done by finding the 
(c, Tf) pair that minimizes 
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for every frame. In practice Equation is expanded into three terms, 

x}ic,Tf) = Aic) + Bic) + 2Cic,Tj), (7) 
A{c) = J2^^^\ScA^)\^ (8) 

Cic^Tf) = Y^^—Re{Sa^)SlM ■ (10) 

— e J 

oj.e ' 

yl(c) can be calculated in advance once the frame clustering has been done. 
B{c,Tf) and C{c,Tf) are calculated in the detection algorithm, outlined in 
pseudo-code in Box (2). The main ideas are described below. 

• Line 1: Frame data- The complete data set is broken up into nonover- 
lapping equal sized frames. 

• Loop 2: Process each frame- It is assumed that a frame can contain 
either noise (see next section), a spike, part of a spike or an overlap 
of 2 spikes. Frames containing noise are discarded leaving only those 
containing single or multiple spike events. 

• Loop 3: Check fit to each cluster- Here the chosen frame is com- 
pared to each cluster. Since the spike in the frame may not be centered 
it is necessary to align the frame to the cluster. For each frame the 
time delay, r/, that maximizes the cross correlation term, C{c,Tf), is 
found for each cluster. 
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• Line 4: Calculate xfj,(Tf,c) for all clusters- If the spike is not 
near the edge of the frame, i.e., |t/| < Tth, then B{c,rf) is calculated 
yielding the final term in Equation (jU)). All of the terms of xjci'^fTc) 
have now been calculated yielding an estimate of the similarity of the 
frame to each of the clusters. 

• Line 5: Finalize spike detection- The cluster that yields the small- 
est value for Xii'^^c) is the cluster most similar to the frame being 
tested. It is not enough to find the cluster for which is smallest. 
A fit is accepted only if x^(r, c) < Xth- Frames that were not good 
matches to any cluster most likely contained an overlap, noise or a par- 
tial spike that will most likely be found when the frames are shifted in 
a later pass through the data. 

Multiple spike (overlap) detection- Once all of the single spikes have 
been detected and removed, overlapping spikes are detected and removed as 
well. Here we generalize the single spike case to two spikes, thus, we look for 
the set of (ri, r2, Ci, C2) that minimizes 

(11) 

The algorithm used to find the clusters and time delays is very similar to 
that used for the single spike case with some differences. First, the minimiza- 
tion executed on line 3 of Box (2) is over two variables tj^i and tj 2- This 
is computationally intensive but since there are a finite number of possible 
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time delays it remains manageable on a personal computer. As before many 
of the calculations can be performed once the cluster centers are known. 

Another issue which needs to be addressed when dealing with two spikes 
is the calculation of cr^^ e(^)- Here we assume that there is an inherent noise 
in the spike shape and an additive background noise. Thus the variances are 
given by 

where (t'^{oo) is the background noise computed from regions devoid of spikes. 
While this assumption is not true in general it is reasonable since the vari- 
ances of the different clusters are assumed to be independent. It proved to 
be a reliable working model. This measure of the two-spike variance takes 
into account the contribution of each cluster but does not overcount the 
background noise. 

4 Application to multi-electrode data 

The techniques described above were applied to data recorded from the 
escape response system of the American cockroach Periplaneta americana 
ICamhi and Levy, 19891 |Kolton and Camhi, 19951 [Liebenthal et al., 19941 |Westin et al., 19771 



Camhi, 1984| . Neural activity was recorded using 8 hook electrodes attached 



to the two bilateral abdominal connectives. In this arrangment each electrode 
measures a weighted sum of the activity from the different neurons in the 
connective. A more detailed description of the experimental setup is given 
in reference [Rinberg and Davidowitz, 2002| . Typical experiments lasted for 
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several hours and yielded about 8 GB of raw data (see Figure (j2I)). The aim 
of the work described here is to unravel the individual spike times from this 
multi-electrode data. 

4.1 Statistical model of the clusters 

Here we describe the application of the clustering algorithm described in 
Section ()3.H) to the multielectrode cockroach data. 

Frame selection- Only frames that fulfill the following criteria were col- 
lected to produce a model of the clusters: a) the signal in the middle of the 
frame is above some threshold, v^"^, on at least one electrode and b) the 
signal at the frame edges is less than some other threshold, t'j^^^, on all of 
the electrodes. This idea is illustrated in Figure Q. 

The thresholds were proportional to the background noise levels. This 
background signal, v;,, was the average of several averages. 



calculated from regions devoid of spikes at the beginning, middle and end 
of the experiment (usually silences between trials when no stimulus was pre- 
sented). Here Vi are the data points while the number of data points was 
typically about Np = 5 - 10"^. The thresholds described above were defined as 
'^th'^ = 4 ■ Vf, and f^^^*^ = 1.5 ■ Vb- These thresholds worked well for our data 
but will likely be different for data from other experiments. 

While the choice of thresholds is to some extent arbitrary, there are several 
guidelines that can be followed. If the threshold is set too high low energy 




(13) 
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spikes will be missed. If it is set too low, a large cluster containing noise 
will appear. We have checked that lowering the thresholds does not influence 
the larger amplitude clusters identified by our algorithm. The setting of the 
threshold is thus a compromise between identifying all the small amplitude 
spikes and minimizing computing time. 

To avoid missing spikes because of voltage drifts or overlapping tails of 
nearby spikes, the frames are first detrended by subtracting a linear fit to 
the first and last 0.5 ms of each frame. 

Time shift between electrodes- Spikes appear on each electrode at dif- 
ferent times because of the finite propagation time of the action potentials 
along the neurons in the connective. This is evident in the data shown in 
Figure (jHI). Frames are thus defined with time delays between the different 
channels. A further complication arises from the fact that the time delays on 
the different electrodes may be different for different cells. An average delay 
was found by calculating the time of the maximum of the cross correlation of 
voltage traces from the different electrodes. Because these delays can change 
during the course of an experiment these inter-channel delays were in turn 
averaged from widely separated segments of data. Typically, these delays 
were between 0.1 and 0.8 ms depending on the positioning of the electrodes. 

Frame size- To keep calculation times short, a small frame size is desir- 
able. On the other hand, the frames have to be big enough to account for 
the different propagation times of the different cells. For the experiments 
described here frames were 3.2 ms long (64 data points). 
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The clustering begins after 10,000 frames containing candidate spikes 
have been collected. While each frame is represented by the vector defined 
in Equation Q only the low frequency components (the first 16 complex 
numbers) of each Fourier transform were used because the higher frequencies 
were found to be indistinguishable from noise. Thus Sf^ei^^) was of dimension 
^ ■ 32 ■ Nf, complex numbers, where N^. is the number of electrodes. 

Since about 7 cells are expected on each side the clustering is stopped 
once 16 clusters have been found on a side. Up until this point the clus- 
tering is automatic. Some results of this automatic clustering are shown in 
Figure (jlj. Intervention is now needed to refine the clustering. To do this 
the distributions of the clusters are examined manually. Clusters are then 
merged, split or discarded as necessary. This process could be automated, 
but was not deemed worth the effort. Results of the clustering after the final 
manual intervention are shown in Figure (jSj) and Figure (jH)). 

Track slow changes- Thus far, the first ~ 10^ frames have been used to 
establish the statistics of each cluster. The accuracy of the spike detection 
can be improved if the change in spike shapes are tracked during the course 
of a long experiment. For the cockroach experiments described here, the 
trial period of 100 s was chosen as the time scale upon which changes are 
tracked. To do this the cluster statistics Sc,e{^) and (Jc^ei^^) are updated 
with consecutive 100 s segments of data on a first-in, first-out basis. Once 
the cluster statistics have been computed, the next "clean" frames from 
the data are found and are each assigned to the cluster that is closest based 
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on the distance defined in Box (1). These frames are then added to these 
clusters while the same number of the earliest frames are removed. As usual, 
frames that exceed a certain for all clusters are discarded. The cluster 
statistics are then recalculated, yielding dynamic clusters of time dependent 
spike statistics Sc,e{(^,t) and ac,e{^^,t) that are used as the templates for the 
spike detection described in the next section. Results of spike shape tracking 
are shown in Figure ((Tj). 

4.2 Detection 

For the detection phase the complete data set is broken into nonoverlapping 
frames containing 64 data points (3.2 ms). In practice, B{c) is calculated only 
after zero padding the frame on the outside 2 ms. This is done to eliminate 
the possibility of a nearby spike, with an overlapping tail, causing xj^d'^fj c) 
to be too large. C(c, tj) does not need to be recalculated because the clusters 
themselves are averages of many clean spikes and therefore inherently zero 
padded. 

A fit is accepted only if x^(r, c) < Xth which was determined by the shape 
of the distribution. Typically Xth ~ 2, though this threshold would likely be 
different for other data. The probability of detecting a spike is smaller if it 
overlaps with the tails of nearby spikes. This problem can be greatly reduced 
if, once spikes have been classified, the average spike from the corresponding 
cluster is subtracted from the data (see Figure ((H)). Every pass through the 
data leaves a smaller number of unclassified spikes. After each pass the data 



17 



is refrained with a time shift of | frame and the process is repeated. Ideally, 
after 4 passes all single spike events have been detected. 

Multiple spike (overlap)detection- Once all of the single spikes have 
been detected and removed, overlapping spikes are detected and removed as 
well. Figures (jHl and ^ show the results of describing an overlap as single 
and double spike events. Notice that the automatic recognition of an event 
as being a two spike event is dependent on being able to reject the "best" 
single spike description of the event. Thus, in Figure (jHl), the best single 
spike description is quite good, and the identification of a spike from cell (b) 
is correct, but the value of = 3.5 is unacceptably high, as can be seen 
from the distributions. Once we explore the space of two spike events we 
find a unique description with ~ 1 (see Figure 0). 

5 Discussion 

The spike sorting method described here has several key features. First, 
the experimental design is such that the full waveforms from all electrodes 
are recorded continuously during the course of an experiment. This is an 
advantage over many spike sorting techniques that are based on the idea 
of feature clustering (see [Lewicki, 1998| for a recent review of spike sorting 
techniques as well as an older review in [Schmidt, 1984| ). In feature clustering 
one or more features of the spikes (spike height, peak to peak amplitude, rise 
times, etc.) are extracted and clustered. Since only a few features of the 
spikes are used, subtle differences between spikes from different cells can be 
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lost. In addition, it is not a priori clear which, and how many, features to 
use — are there better (optimal) ones? When applied properly this technique 
can work well for a small number of cells but does not work well with many 
cells. 

Another important feature of the present technique is that the construc- 
tion of models for the spike shapes from individual cells (clustering) is sepa- 
rated from the problem of finding those spikes in the data (detection). The 
advantage here is that one has the leisure to look for clean examples of 
each spike type and thus to build a good statistical model of each spike type. 
Only then does one look for the occurrence of these spikes in the data. These 
models are time dependent in the sense that they track the change in spike 
shape during the course of an experiment. This has traditionally been a 
problem in template-matching spike sorting techniques in which a model of 
the spike shapes are constructed [Bergman and DeLong, 1992| . In this tech- 
nique, these models are compared to a given spike and a decision is made as 
to whether it belongs to the class defined by this template or not. One of the 
more advanced implementations of this technique [Lewicki, 1994| works well 
in many cases but relies heavily on a Gaussian model of the noise. Another 
technique [Fee et al., 1996a] that makes no assuptions about the waveform 
noise uses the spike shapes as well as refractory period statistics to classify 
cells. It works well with bursting cells but in its present form this technique 
does not treat overlaps well. This is another advantage of the separation 
of clustering from detection: it simplifies the problem of overlap detection 
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which has traditionally been one of the most difficult parts of the spike sorting 
problem. 

Many comparisons and computations that go into the sorting process 
are much easier when working in the frequency domain. The ease in which 
temporal alignment is achieved is one of the advantages of performing the 
detection in the frequency domain since sub-bin time shifts in the time do- 
main would require resampling [Lewicki, 1998| . In the frequency domain 
this is done simply by multiplying the Fourier components by e*"^"^ which is 
equivalent to a time shift of r in the time domain. In addition, it is even 
possible to sort spikes from bursting cells (in which spike shapes can change 
drastically over short time scales) by ignoring the spike shapes entirely and 
sorting on transfer function ratios [Rinberg et al., 1999| . Since this is an 
independent type of information it can be used to check the validity of sort- 
ing results. Sorting on the full Fourier transforms of the voltage waveforms 
yielded excellent results for our data but other investigators will likely find 
other combinations that work better for their data. 

The statistical methods used in this sorting program allow us to decide 
whether the clustered spikes really are discriminable. In addition, this sta- 
tistical analysis allows us to develop more rigorous criteria for accepting or 
rejecting the possible detection of a spike. 

Finally we note that in a typical experiment ^ 400, 000 spikes are found. 
Of these 90% were found with single spike detection, 8% were found with 
overlap detection only 2% were events that could not be classified. 
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6 Summary 



In this paper the problem of unravehng multielectrode neural data has been 
addressed. Special attention has been paid to the detection of overlapping 
spikes which poses obvious difficulties for any sorting method. The problem 
of spike sorting has been separated into two independent parts. First, a sta- 
tistical model of all possible spikes found in the data is constructed. Only 
then are these spikes detected in the data using strict statistical criteria to 
quantify the quality of this detection. Overlaps are dealt with after all pos- 
sible single spike events have been detected. These techniques were applied 
to multielectrode data from the American cockroach with good results. 
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1 


set Tf := and calculate initial S'c,e(<^) and a^d^) 


2 


repeat split clusters 


3 


repeat update frame assignment 




find average and variance for each cluster using r/ 




' ^ ' /\/ ^^^^^^ %i I ^ ' 








for every frame, /, find t/ that minimizes 




uj,e ^ ' 




assign frame to the cluster that yields smallest X/,c('''/) 




until clusters are stationary 




until number of clusters > expected number of cells 


4 


finalize clustering by merging, splitting or removing clusters 


5 


repeat track slow changes in spike shapes 




find spikes in next time chunk of data and update cluster statistics 




until end of data is reached 


Box 1: 


Pseudo code outline of the frame clustering algorithm. See text for 



details. 
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1 Frame whole data set with no overlaps 

2 repeat get next frame, / 

if frame has signal above background then 

3 repeat for each cluster, c 

find time delay, t/, that maximizes 



C(c,r/) 



uj,e ^'^^ ' 



if |r/| < Tth then zero pad frame and calculate X/(t/,c) 
until all clusters checked 

find cluster, c, that yields smallest X/,c(''7'^) 
if c) < Xt^j keep this spike and subtract it from data 

until end of data reached 



Box 2: Pseudo code outline of the single spike detection algorithm. See text 
for details. 
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Figure captions 



Figure 1: A schematic outline of the spike detection algorithm. Spike events 
are labeled a...f and are shown for only one electrode and only two cells for 
the sake of clarity. The detection procedure progresses from top to bottom 
yielding the spike times of two cells shown in bottom of the diagram. The 
rectangular boxes represent the frames. Note that a spike near a frame edge 
is not detected until the framing has shifted enough to more or less center it. 
Overlaps are not detected until all of the single spikes have been removed. 
In the work described here, both the single spike detection and the overlap 
detection processes consisted of 4 passes each. See text for more details. 



Figure 2: Sample of the raw data used in the spike sorting. Typically, signals 
from 4 electrodes were recorded on each side of the abdominal connective 
along with 2 stimulus channels (not shown). Three channels from the right 
connective are shown. Data was recorded at 20 kS/s, 16 bits per channel. 



Figure 3: Example showing the framing of the data from three electrodes 
(1-3). Grey boxes show frames that have been found to contain a candidate 
spike. 10,000 such frames are collected to generate a statistical model of 
the spikes. Each white box contains a frame that has been rejected for one 
or more of the following reasons: a) they contain noise, b) they contain an 
overlap or c) their energy is too high at the frame edges. Note that in this 
data there is a time shift in the appearance of the same spike on the different 
electrodes. This is a consequence of the spike propagation velocity but is not 
a necessary condition for the spike sorting techniques described here. 
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Figure 4: Results of the automatic frequency domain spike sorting showing 
four possible situations. Each row corresponds to one cluster. The center 
columns (labeled 1-4) show the spike shapes as they appear on different 
electrodes. Each trace shows 30 randomly chosen spikes from each cluster. 
The left column shows the average spectra of the signal from electrode 2 
with the corresponding variances. Ngp is the number of spikes found in 
each cluster. The right column shows the distributions of the distance 
of all spikes to this cluster center. The shaded area (thin lines) shows the 
distributions for member (nonmember) spikes. Row 1 shows a good cluster 
clearly separated from the others. Rows 2 and 3 show clusters that overlap 
each other and should be merged. Row 4 shows a small cluster with no clear 
center which will be deleted. Its members will be reassigned to other clusters. 

Figure 5: Resulting clusters after manual intervention. Some clusters from 
Figure (jH) have been joined, others have been deleted and still others have 
been split, depending on the distribution resulting from the automatic 
sort. Note the clear separation of the distributions. 



Figure 6: Cluster distribution at one frequency component in the complex 
plane. The circles correspond to a distance of 2a from the center. Even 
though the clouds partially overlap for these components, they are well sep- 
arated in multidimensional space. 

Figure 7: The evolution of the spike shape over the course of an experiment, 
shown for a single cluster. 

Figure 8: Attempt to describe an overlap of two spikes as a single spike event. 
The dotted traces at the top are the actual raw data of an overlap recorded 
on 4 electrodes. The middle traces show four clusters a-d. The bottom traces 
show calculated for different time shifts. The minimum x"^ is 3.5 found 
by fitting cell b with a time shift, r ~ ms. The thin line traces at the top 
of the figure show this fit. 
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Figure 9: Same as Figure (jHI) except that a fit is attempted using two cells. 

is now a function of two time shifts, ti and T2. The panels on the bottom 
show as a function of these two time shifts. The circles are centered at 
the minimum of for each combination of cells. The smallest ^^is found 
for cells b and d. This fit is plotted as a thin line at the top of the figure. 
Note that this fit is considerably better that the best fit of any single spike 
event. 
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